Add streaming TTFT (Time To First Token) support for LlamaIndex instrumentation #281

Open
shuningc wants to merge 2 commits into main from HYBIM-620-streaming-ttft
@shuningc (Contributor) commented Apr 20, 2026

Introduces an EventHandler that listens to the LlamaIndex instrumentation event system (LLMChatStartEvent/LLMChatInProgressEvent) to measure the time between an LLM request and the first streaming token. The metric is recorded as gen_ai.response.time_to_first_chunk on the LLM span.

The implementation bridges two systems:

  • CallbackHandler (fires on_event_start/on_event_end with event_id)
  • EventHandler (fires per-token with span_id)

A ContextVar correlates the callback event_id with the event span_id, and TTFTTracker calculates the delta.
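
For reviewers, the correlation described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: only the names TTFTTracker, ContextVar, event_id and span_id come from the description, while the method names (on_callback_start, on_chat_start, on_chunk, pop_ttft), the injectable clock, and the dictionary layout are hypothetical.

```python
import time
from contextvars import ContextVar
from typing import Callable, Dict, Optional

# Hypothetical: event_id of the callback-side LLM event active in this context.
_current_event_id: ContextVar[Optional[str]] = ContextVar(
    "current_event_id", default=None
)


class TTFTTracker:
    """Sketch of a tracker that maps span_id back to event_id and times the first chunk."""

    def __init__(self, clock: Callable[[], float] = time.perf_counter) -> None:
        self._clock = clock
        self._start: Dict[str, float] = {}          # event_id -> request start time
        self._span_to_event: Dict[str, str] = {}    # span_id -> event_id
        self._ttft: Dict[str, float] = {}           # event_id -> seconds to first chunk

    def on_callback_start(self, event_id: str) -> None:
        # Callback side (on_event_start): record the start and publish the event_id.
        self._start[event_id] = self._clock()
        _current_event_id.set(event_id)

    def on_chat_start(self, span_id: str) -> None:
        # Event side (chat start): bind this span_id to the event_id in context.
        event_id = _current_event_id.get()
        if event_id is not None:
            self._span_to_event[span_id] = event_id

    def on_chunk(self, span_id: str) -> None:
        # Event side (per token): only the first chunk of each event is timed.
        event_id = self._span_to_event.get(span_id)
        if event_id is None or event_id in self._ttft:
            return
        start = self._start.get(event_id)
        if start is not None:
            self._ttft[event_id] = self._clock() - start

    def pop_ttft(self, event_id: str) -> Optional[float]:
        # Callback side (on_event_end): return the delta and drop all state for the event.
        self._start.pop(event_id, None)
        for span_id, mapped in list(self._span_to_event.items()):
            if mapped == event_id:
                del self._span_to_event[span_id]
        return self._ttft.pop(event_id, None)
```

In this sketch, the value returned by pop_ttft is what would be set as gen_ai.response.time_to_first_chunk on the LLM span when the callback event ends. A ContextVar is used so that concurrent requests in separate async contexts do not see each other's event_id.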

Parent PR #274

…umentation


Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@shuningc requested review from a team as code owners on April 20, 2026 07:35